Dynamics of the tuning process between singers. 
We present a dynamical model describing a predictable human behavior, the tuning process
between singers. The proposal, grounded in the physiology and behavior of human beings, is
sensitive to the whole Fourier spectrum of each emitted sound and allows for an asymmetric
coupling between individuals. We recorded several tuning exercises and compared the
experimental evidence with the results of the model, finding very good agreement between
calculated and experimental sonograms.
Introduction. When we think of music, we commonly do so in relation to feelings and
emotions arising from the sub-cortical limbic system of the brain [1]. However, music or
sound perception is a very complex sequence of transductions, beginning with the input of
pressure waves to the ear and ending with cognitive operations carried out in the brain's
external neocortex [2]. Consequently, an overall understanding of what music means to
human beings requires physical, biological, neural, physiological and behavioral grounds
[3]. In this letter we focus on the tuning process between singers. The ability of a human
being to sing in tune depends strongly on natural conditions, training and previous
experience. As a result, the outcomes of tuning experiments can be very different even for
the same initial conditions. To avoid subjectivity we restricted the possible solutions by
giving a clear watchword aimed at achieving tuning at the same note or at an octave. In
this way we were able to analyze basic human behavior experimentally and consequently to
propose a phenomenological mathematical model describing it. While there are many works
regarding synchronization 0, |3, 0: 13 (i.e., phase adjustment), this is, to our
knowledge, the first model that accounts for the evolution of frequency spectra
interacting with each other.

A musical note is a complex periodic oscillation that can be decomposed into a sum of
sinusoidal excitations, the harmonics, each one with a frequency that is a multiple of a
particular frequency called the fundamental. Thus, if $\omega_0$ is the fundamental
frequency, the Fourier spectrum of a note is composed of peaks at $\omega_0$, $2\omega_0$,
$3\omega_0$, etc. The pitch indicates how high or low a particular note is and is labeled
with the value (or name) of the fundamental. The relative intensities with which the
harmonics participate in the sound define its timbre [3].
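
As a rough numerical illustration of the harmonic structure just described, the following Python sketch lists the first few peaks of a note's Fourier spectrum. The function name `harmonic_series` and the sample fundamental of 220 Hz are illustrative choices and do not come from the experiments reported here.

```python
def harmonic_series(fundamental_hz, n_harmonics):
    """Return the frequencies w0, 2*w0, ..., n*w0 of an ideal periodic tone."""
    if fundamental_hz <= 0 or n_harmonics < 1:
        raise ValueError("need a positive fundamental and at least one harmonic")
    return [k * fundamental_hz for k in range(1, n_harmonics + 1)]


# Hypothetical example: a note with fundamental 220 Hz and five harmonics.
peaks = harmonic_series(220.0, 5)
print(peaks)  # [220.0, 440.0, 660.0, 880.0, 1100.0]
```

Two sounds with the same list of peaks but different relative intensities would share the pitch and differ only in timbre.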

A noise, in turn, is a sum of excitations without any relationship between the individual
frequencies, although the boundary between music and noise is subjective: one can hear
musicality in a given noise or find a musical sound noisy. In the same way, the idea of
consonance or dissonance is also a subjective, even cultural, concept. Nevertheless, there
are physiological reasons for consonance: the inner ear contains a duct of variable
cross-section, the cochlea, inside which a stationary wave is formed. From the
hydrodynamical point of view, the cochlea is divided into two channels separated by the
basilar membrane. The pressure differences between the two sides of the membrane produce
deformations resulting in a resonance pattern detected by a series of thin receptors, the
hair cells, which are connected to neurons [2]. Thus, the electrical signal sent to the
brain is in fact a transduction of the geometrical representation of the deformation of
the basilar membrane. The set of nodes of the stationary wave is consistent with only one
note (i.e., with only one Fourier spectrum), and therefore two notes are more consonant
the more nodes they have in common [1]. Mathematically, consonance is reflected in a
simple ratio between the fundamental frequencies of the notes. For example, if
$\omega_{10}$ and $\omega_{20}$ are the fundamentals of two notes, a sequence from
consonance to dissonance is $\omega_{20}/\omega_{10} = 1, 2, 3/2, 4/3$, etc. The
corresponding intervals between $\omega_{10}$ and $\omega_{20}$ are called the same note,
octave, fifth, fourth, etc., respectively. For the purpose of this work, we define tuning
as the process in which two or more sound emitters change their pitches so as to match all
or part of their Fourier spectra.
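
The correspondence between simple frequency ratios and interval names can be sketched as follows. This is only an illustration: the helper `nearest_simple_ratio`, the denominator bound of 6 and the sample frequencies are assumptions made here, and `fractions.Fraction.limit_denominator` is used merely as a convenient way of finding the closest small-integer ratio.

```python
from fractions import Fraction

# Interval names quoted in the text, keyed by the ratio of fundamentals.
INTERVAL_NAMES = {
    Fraction(1, 1): "same note",
    Fraction(2, 1): "octave",
    Fraction(3, 2): "fifth",
    Fraction(4, 3): "fourth",
}


def nearest_simple_ratio(f_high, f_low, max_denominator=6):
    """Closest fraction to f_high / f_low whose denominator is small."""
    ratio = Fraction(f_high) / Fraction(f_low)
    return ratio.limit_denominator(max_denominator)


# Hypothetical pair of fundamentals (in Hz) lying close to a fifth.
ratio = nearest_simple_ratio(331, 220)
print(ratio, INTERVAL_NAMES.get(ratio, "not named here"))
```

The bound on the denominator plays the role of "simplicity": raising it lets ever more dissonant ratios count as matches.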

Looking for the model. In order to elaborate a mathe- 
matical model which represents the main features of the 
tuning process, we are going to extract basic ideas from 
some well-known responses of the auditory system and 
also from prototype experiments: 

(i) The interaction term has to be a function that goes to zero when the ratio between
frequencies is a simple fraction. In this way, we cover the physiological and mathematical
grounds of consonance.

(ii) Two complex tones belonging to the same Fourier family and differing only in that one
of them has the fundamental missing will be heard by the singer as having the same pitch.
This ability of human beings was characterized and explained through the concept of
virtual pitch perception introduced by E. Terhardt |3, |9|. We have verified this response
in several experiments in which the singer was asked to tune to a guitar sound whose lower
harmonics were successively filtered out. Consequently, the functional response should
depend on the whole spectrum rather than on a single frequency.

(iii) The subjective element of "how I listen to my partner and how predisposed I am to
interact with him" can yield different final results of tuning exercises even for the same
pair of singers and the same initial condition. The model must allow for this possibility.

(iv) Finally, and with the aim of fixing the terminology, it is useful to analyze a very
simple tuning exercise: one singer is asked to maintain his pitch while the partner moves
until both are in tune. Neither of them knows the initial note of the other. The sonogram
(temporal evolution of the Fourier spectrum) of this exercise is shown in Fig. 1. At the
beginning there is a brief interval of about 0.2 s in which the singers locate their
initial notes; then a period of approximately one second elapses, and finally they
effectively start the exercise. We interpret the first interval as the time needed to
prepare the singing apparatus (vocal cords, resonators, air emission, etc.) to produce a
musical sound. The time spent in this action can be reduced with training. The second
stage is needed to perform the cognitive operation of listening to all the notes emitted
and deciding whether to move the pitch up or down in order to tune. The remaining time is
dedicated to feedback in order to achieve tuning. We call this last stage dynamical
tuning. In the example of Fig. 1, because of the watchword given, one of the singers acts
as if he did not hear the other. In other words, there is an asymmetric coupling between
them.
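
Point (ii) above can be illustrated with a toy calculation. The sketch below is not a model of virtual pitch perception, which is far more elaborate; it only assumes partials given as exact whole numbers of Hz and takes their greatest common divisor as the implied fundamental. The function name `implied_fundamental` and the sample partials are hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental(partials_hz):
    """Greatest common divisor of partial frequencies given in whole Hz."""
    if not partials_hz:
        raise ValueError("need at least one partial")
    return reduce(gcd, partials_hz)


full_tone = [200, 400, 600, 800]   # fundamental present
filtered_tone = [400, 600, 800]    # fundamental filtered out
print(implied_fundamental(full_tone), implied_fundamental(filtered_tone))
```

Both tones imply the same fundamental even though the second one contains no energy at that frequency, which is the property the interaction term is required to respect.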



Keeping in mind these precepts, we propose the fol- 
lowing set of equations describing the coupled dynamical 
evolution of complex tones emitted by N singers: 
FIG. 1: Sonogram corresponding to a singer tuning to a note emitted and maintained by
another singer.



(1) 

Here $\omega_j^{\mu}$ is the frequency of the $\mu$-th harmonic of the $j$-th individual
within the group, $I_j^{\mu}$ is the relative intensity of the corresponding harmonic in
the Fourier expansion defining the timbre of the sound, $K_{ij}$ is an off-diagonal matrix
($K_{ii} = 0$) representing the effective relative magnitude of the coupling between pairs
of singers, and $m$ is an integer constant.

The argument of the sine governs the stability of the equations, since the time derivative
goes to zero when the ratio $m\omega_j^{\mu}/\omega_{i0}$ is an integer. This sine function
is indeed the key point of the model. The condition for the roots works as the
mathematical representation of the natural behavior of singers, who tend to maximize the
coincidence of nodes in the resonance pattern of the basilar membrane. In this sense, the
suitability of the sine-like interaction is independent of $m$, since regardless of the
particular value of $m$ this functional response meets the requirements of point (i). In a
more general approach, the set of Eqs. (1) should include a sum over $m$ but, as we will
see later, the trends of the experimental records can be reproduced with only one family
of $m$-like functions.
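
The root condition can be checked numerically with a short sketch. Since Eq. (1) itself is not reproduced in this text, the precise argument of the sine is an assumption: here we take $\sin(2\pi m\,\omega_j^{\mu}/\omega_{i0})$, where the factor $2\pi$ is chosen only so that the term vanishes at integer ratios. The function name `sine_interaction` is likewise hypothetical.

```python
import math


def sine_interaction(w_partner_harmonic, w_own_fundamental, m=1):
    """ASSUMED interaction term sin(2*pi*m*w_j_mu / w_i0); not taken from Eq. (1)."""
    return math.sin(2.0 * math.pi * m * w_partner_harmonic / w_own_fundamental)


# Integer ratios (same note, octave) give a vanishing term ...
for w_j in (220.0, 440.0):
    print(w_j / 220.0, abs(sine_interaction(w_j, 220.0)) < 1e-9)

# ... whereas a non-integer ratio such as 258/187 does not.
print(abs(sine_interaction(258.0, 187.0)) > 0.1)
```

With this assumed form the condition holds for any integer $m$, in line with the statement that the sine-like response satisfies point (i) regardless of the value of $m$.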

The absolute magnitude of the coupling is given by the whole right-hand side of Eq. (1),
and the product $K \times I$ defines the time scale of the process. A coupling
proportional to all the $I_j^{\mu}$ guarantees a response to the Fourier family rather than
to one frequency in particular [point (ii)]. To zeroth order, $K_{ij}$ can be thought of as
a measure of the volume of the emission, but since this matrix is asymmetric in general
($K_{ij} \neq K_{ji}$), it can also describe the case in which one of the singers always
emits the same pitch independently of the movement of the rest. $K_{ij} = 0$ means "the
$i$-singer is not coupled with the $j$-singer", either because he does not listen to the
group or because he has decided not to change his pitch. By adopting different values for
$K_{ij}$ we can obtain different final results for the same pair of singers starting from
the same notes [point (iii)].

By construction, this model is oriented to describe the dynamical tuning, i.e., the stage
in which the singers start to move their frequencies through interaction.
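
To make the roles of $K_{ij}$, $I_j^{\mu}$ and $m$ concrete, the following Python sketch integrates a toy version of the coupled dynamics. It must be read with caution: Eq. (1) is not reproduced in this text, so the right-hand side used here, $d\omega_{i0}/dt = \sum_{j} K_{ij} \sum_{\mu} I_j^{\mu} \sin(2\pi m\,\mu\,\omega_{j0}/\omega_{i0})$, is an assumed stand-in chosen only because it has the properties described above. The function names, the three-harmonic timbre, the coupling strengths and the forward-Euler integrator are all illustrative assumptions.

```python
import math


def frequency_rates(w, K, intensities, m=1):
    """Right-hand side of the ASSUMED model (a stand-in for Eq. (1)):
    dw_i/dt = sum_j K[i][j] * sum_mu I[j][mu] * sin(2*pi*m*mu*w_j / w_i).
    """
    n = len(w)
    rates = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # K is off-diagonal: no self-coupling
            for mu, amp in enumerate(intensities[j], start=1):
                total += K[i][j] * amp * math.sin(2.0 * math.pi * m * mu * w[j] / w[i])
        rates.append(total)
    return rates


def simulate(w0, K, intensities, m=1, dt=1e-3, t_end=3.0):
    """Forward-Euler integration of the fundamentals (Hz) with time step dt (s)."""
    w = list(w0)
    for _ in range(int(round(t_end / dt))):
        rates = frequency_rates(w, K, intensities, m)
        w = [wi + dt * ri for wi, ri in zip(w, rates)]
    return w


# Made-up timbre (three harmonics per singer) and coupling strengths in Hz/s.
timbre = [[1.0, 0.5, 0.25], [1.0, 0.5, 0.25]]
symmetric = [[0.0, 50.0], [50.0, 0.0]]   # both singers listen to each other
one_sided = [[0.0, 0.0], [50.0, 0.0]]    # singer 1 holds, singer 2 follows

print(simulate([187.0, 258.0], symmetric, timbre))
print(simulate([185.0, 220.0], one_sided, timbre))
```

In this toy version the symmetric coupling drives both fundamentals to a common intermediate value, while setting $K_{12} = 0$ leaves the first singer fixed and brings the second one to his pitch. Whether the same holds quantitatively for the actual Eq. (1) cannot be inferred from this sketch.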

Results and Discussion. The individuals selected for all the experiments were
non-professional singers, but most of them have or had some training in collective
singing. We formed 24 pairs of singers and recorded more than one hundred exercises. The
exercises were simple: first the initial note is indicated, each singer listening only to
his own note. Then they start simultaneously and change their pitches until they are in
tune. The watchword was "to arrive at the same note", which for a moderately trained singer
includes the possibility of tuning at an octave. This last alternative is more probable
when the separation between the initial pitches is large and/or when dealing with a
female-male pair. We have not taken into account those records in which the watchword was
not properly understood. We also discarded records in which one of the singers was near
the limit of his range.
The numerical solution of the system of equations was obtained with a one-step solver
based on a Dormand-Prince Runge-Kutta formula [10], in which the frequencies were assumed
constant within the brief interval corresponding to the adopted discretization
($\simeq 1$ ms). The numerical absolute error for the fundamental frequencies was 10~^Hz.
The initial values of the fundamental frequencies and harmonic intensities were extracted
from a Fourier analysis of a small initial interval of the experimental sonogram. Strictly
speaking, the harmonic intensities change with the frequency [$I = I(\omega)$]. However,
this $\omega$-dependence is noticeable only when the pitch of the emitted sound is close to
the boundaries of the range (especially the upper limit). Therefore, considering that most
of the exercises involve pitches in the middle region of the range, we assume the harmonic
intensities to be constant.
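
For readers unfamiliar with this family of solvers, a single step of the fifth-order Dormand-Prince formula can be sketched as below. This is a fixed-step, scalar illustration using the standard Dormand-Prince coefficients; it omits the embedded fourth-order error estimate and the step-size control that production solvers normally add, and it is not claimed to be the implementation used for the results reported here. The names `dopri5_step` and `integrate` are hypothetical.

```python
def dopri5_step(f, t, y, h):
    """One fixed-size step of the fifth-order Dormand-Prince formula for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 5, y + h * (k1 / 5))
    k3 = f(t + 3 * h / 10, y + h * (3 * k1 / 40 + 9 * k2 / 40))
    k4 = f(t + 4 * h / 5, y + h * (44 * k1 / 45 - 56 * k2 / 15 + 32 * k3 / 9))
    k5 = f(t + 8 * h / 9, y + h * (19372 * k1 / 6561 - 25360 * k2 / 2187
                                   + 64448 * k3 / 6561 - 212 * k4 / 729))
    k6 = f(t + h, y + h * (9017 * k1 / 3168 - 355 * k2 / 33 + 46732 * k3 / 5247
                           + 49 * k4 / 176 - 5103 * k5 / 18656))
    return y + h * (35 * k1 / 384 + 500 * k3 / 1113 + 125 * k4 / 192
                    - 2187 * k5 / 6784 + 11 * k6 / 84)


def integrate(f, y0, t_end, h=1e-3):
    """Advance a scalar ODE from t = 0 to t_end with constant step h."""
    steps = int(round(t_end / h))
    t, y = 0.0, y0
    for _ in range(steps):
        y = dopri5_step(f, t, y, h)
        t += h
    return y


# Sanity check on a problem with a known solution: y' = -y, y(0) = 1.
print(integrate(lambda t, y: -y, 1.0, 1.0))
```

With a step of 1 ms, comparable to the discretization quoted above, the printed value should agree with $e^{-1}$ to many digits, which gives a feeling for why such a scheme is more than adequate on time scales of seconds.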

Figure 2 shows results for typical examples of tuning exercises and their corresponding
simulations. The left panel shows the experimental sonogram of the dynamical tuning and
the right panel the results of the model given by Eqs. (1). Fig. 2(a) corresponds to a
baritone and a mezzo-soprano who start with relatively close pitches ($\omega_{10} = 187$ Hz
and $\omega_{20} = 258$ Hz) and after 2.5 s converge to a common note of intermediate value
($\omega_0 = 219$ Hz). Both experimental and theoretical results show an asymmetric
dynamical evolution of each spectrum even though the coupling is symmetric in this case
($K_{21}/K_{12} = 1$). Fig. 2(b) shows the exercise for two tenors, one of them practically
maintaining his pitch (by his own decision). Here the initial interval is already a kind
of consonance, since there is a coincidence in high-order harmonics
($\omega_{20}/\omega_{10} = 220$ Hz$/185$ Hz $\simeq 6/5$), but as the instruction is to move
towards the same note, one of the singers changes his pitch until his whole spectrum locks
with the other. In this case we reproduce the experimental evolution by adopting an
asymmetric coupling ($K_{21}/K_{12} = 25$). Figure 2(c) is an example of tuning at an octave
for a baritone-soprano pair. The initial interval is a fifth
($\omega_{20}/\omega_{10} = 272$ Hz$/181$ Hz $\simeq 3/2$), and after the dynamical tuning
they converge to an octave with a fundamental frequency of $\omega_{10} = 146$ Hz for the
baritone. In this example the ratio between effective couplings is $K_{21}/K_{12} = -1$. In
all cases the time required for the dynamical tuning is about 1.5 s to 2 s, independently
of the training or the quality of the singer.
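
How close the quoted initial intervals are to the simple ratios 6/5 and 3/2 can be expressed in cents, the usual logarithmic unit in which one octave equals 1200 cents. The short sketch below computes this deviation for the two initial intervals quoted above (220 Hz/185 Hz and 272 Hz/181 Hz); the function name `cents_from_ratio` is an illustrative choice, and the calculation is offered only as a reading aid, not as part of the original analysis.

```python
import math
from fractions import Fraction


def cents_from_ratio(f_high, f_low, target):
    """Deviation, in cents, of the interval f_high/f_low from a target ratio."""
    return 1200.0 * math.log2((f_high / f_low) / float(target))


# Initial intervals quoted in the text and the simple ratios they approximate.
for f_high, f_low, target in [(220.0, 185.0, Fraction(6, 5)),
                              (272.0, 181.0, Fraction(3, 2))]:
    print(target, round(cents_from_ratio(f_high, f_low, target), 1))
```

A negative value means the sung interval is narrower than the target ratio, a positive one that it is wider.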

FIG. 2: Tuning at the same note. Left panel: experimental sonogram corresponding to the
dynamical tuning. Same colormap as in Fig. 1. Right panel: results of the model given by
Eq. (1) with $m = 1$. The values of the fundamental frequencies and the relative
intensities of the harmonics are taken from the Fourier analysis of the interval
0 s - 0.5 s. The drawing of the simulations does not take the Fourier intensities into
account. (a) A baritone and a mezzo-soprano with $K_{21}/K_{12} = 1$. (b) The same as (a)
for two tenors with $K_{21}/K_{12} = 25$. (c) The same as (a) for a baritone and a soprano
with $K_{21}/K_{12} = -1$.

Theoretical results shown in Fig. 2 are very encouraging, since they reproduce the
experimental records almost exactly, but a word of caution is in order. The minus sign in
the ratio between effective couplings of the simulation presented in Fig. 2(c) seems
counterintuitive. However, it is not a conceptual barrier, since in this case the
fundamental frequencies need to move away from each other in order to tune at an octave.
The minus sign thus changes the direction of the derivative, facilitating the movement of
the fundamentals in the correct sense. We remark that we have aimed to fit the
experimental records with only one type of $m$-like function. In the context of this
paper, the constant $m$ works as a degree of freedom of the model. In many cases, mainly
when the tuning is not at the same pitch, the model with $m = 1$ is not able to reproduce
the experimental evidence, although by fixing $m = 2$ we recover a good agreement. Clearly,
the stability domains in the time scale selected for the system of equations change with
$m$, and it is therefore necessary to analyze which value of $m$ is appropriate for each
case. This additional degree of freedom allows us to explore other possible solutions.
Figure 3 shows an interesting situation in which we changed the watchword, asking the
singers "to arrive at a pleasant sensation". Here we can study what consonance means for
each pair, since the watchword can be interpreted in a more subjective fashion. The
example shown in Fig. 3 is part of an exercise lasting approximately 12 s in which the
singers go through several stages of dynamical tuning. The sequence was first a fourth and
then three different fifths, each one in a more comfortable region of their ranges. We
selected the first movement, from a fourth
($\omega_{20}/\omega_{10} = 420$ Hz$/319$ Hz $\simeq 4/3$) towards a fifth
($\omega_{20}/\omega_{10} = 425$ Hz$/283$ Hz $\simeq 3/2$). The record was reproduced by
taking $K_{21}/K_{12} = -0.15$ and $m = 2$. We note that, because of its subjectivity, the
results emerging from this second watchword were diverse and very singer-dependent, and we
did not always achieve a good simulation.

FIG. 3: A tenor and a mezzo-soprano tuning in consonance. Left panel: experimental
sonogram. Same colormap as in Fig. 1. Right panel: results of the model given by Eq. (1)
with $m = 2$. The fundamental frequencies and the relative intensities of the harmonics are
taken from the Fourier analysis of the interval 0 s - 0.2 s. In this case
$K_{21}/K_{12} = -0.15$.

Conclusions. In summary, in this work we propose a model describing a particular and
predictable human behavior, the tuning process between singers. The calculations were done
taking as input parameters the experimental values of the initial fundamental frequencies
and harmonic intensities. We were able to reproduce the dynamical evolution almost exactly
for several situations, and we believe that this model, which contains the main features
of the tuning process, could be the starting point for further investigations in this
field.